This is the sample code for the Game Developer article
"Adaptive Compression Across Unreliable Networks".

Summary
-------
The code simulates a server sending a lot of data to a
client, where some of the data packets can be lost.
Nevertheless, the server and client manage to keep
their data statistics in sync, so that adaptive
compression can be performed.  The two sides exploit
knowledge they already share to update the data
statistics without ever transmitting those statistics
over the network.  (Transmitting them would be
inefficient.)

This code is an extension of the ideas in "Using an
Arithmetic Coder, Part 2".  

First, we use the file "training.txt" to build a
static data model on both the client and server.  Then
the server begins sending "to_transmit.txt" to the client.
The file "to_transmit.txt" is fairly long; it consists
of two chapters of Paradise Lost, followed by the FAQ
for the newsgroup comp.graphics.algorithms, followed
by the source of an earlier version of this code.

Run the code without arguments; it's hardcoded to know
the names of the input and output files.


Details
-------

In addition to the bit-shifting versions of pack() and
unpack() from Part 2, this code also employs the
equal-probability version from Part 1.  This is useful
because it lets us pack numbers whose range exceeds
the denominator of our usual probability fraction.  (In this
code, we transmit message IDs of up to 6000, but our
probability denominator is only 4096.)  It also saves
us from allocating long arrays just to store
equal-probability approximations.
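
As a rough illustration of why equal-probability packing has no
trouble with large ranges, here is a minimal sketch that packs by
multiply-and-add.  It is not the packer in this archive: the name
Equal_Packer and the single 64-bit accumulator are assumptions
made for the example, and a real packer would flush bits to a
buffer as it goes instead of asserting on overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not the packer in this archive.
// Every value in [0, range) is treated as equally likely, so no
// probability table is needed and 'range' may exceed 4096.
struct Equal_Packer {
    uint64_t accumulator = 0;
    uint64_t total_range = 1;   // product of all ranges packed so far

    void pack(uint32_t value, uint32_t range) {
        assert(range > 0 && value < range);
        // Sketch only: refuse to overflow instead of flushing bits.
        assert(total_range <= UINT64_MAX / range);
        accumulator += value * total_range;
        total_range *= range;
    }

    // Values come back out in the same order they were packed,
    // provided the caller passes the same ranges in the same order.
    uint32_t unpack(uint32_t range) {
        assert(range > 0);
        uint32_t value = (uint32_t)(accumulator % range);
        accumulator /= range;
        return value;
    }
};
```

Packing a message ID with pack(id, 6000) costs about 12.55 bits
here, regardless of the 4096 denominator used elsewhere.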

If you try to use this code to transmit extremely long
messages, the code will break, due to the hardcoded limit
of 6000 on message IDs.  This would be easy to fix for a
production system.  (See the explanation in config.h.)

The reported packet loss does not account for duplicated
packets; a duplicate would be counted as if it were a distinct
packet.  This simulation never duplicates packets, so that is
not a problem here.  Think about it if you choose to build a
similar system into your own code, though.
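
If your transport can duplicate packets, one way to keep the loss
figure honest is to count each packet ID only once.  The sketch
below is an assumption about how you might do that; Loss_Tracker
and its members are made-up names, not part of this code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical receiver-side bookkeeping for the loss statistic.
struct Loss_Tracker {
    std::unordered_set<uint32_t> seen_ids;
    uint32_t num_duplicates = 0;

    // Returns true the first time packet_id arrives, false for a
    // duplicate.
    bool note_arrival(uint32_t packet_id) {
        bool is_new = seen_ids.insert(packet_id).second;
        if (!is_new) num_duplicates++;
        return is_new;
    }

    // Fraction of sent packets that never arrived, counting each
    // ID once no matter how many copies showed up.
    double loss_fraction(uint32_t num_sent) const {
        assert(num_sent > 0 && seen_ids.size() <= num_sent);
        return 1.0 - (double)seen_ids.size() / (double)num_sent;
    }
};
```

The set grows with every packet, so a long session would want a
bounded window of recent IDs instead.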

rand() is a poor-quality random number generator, and as a
result, the packet loss actually generated by the system will
usually be significantly different from what you set it to
in config.h.  If you're aiming to measure compression effectiveness
at a given level of packet loss, then you must tweak the config.h
number until you get the appropriate amount of loss reported in
the output.  Alternatively, you could just put a better random
number generator into this code.
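
One possible replacement, if your compiler has the C++11 <random>
header, is sketched below.  Packet_Dropper is a made-up name and
this is not how the code in this archive decides to drop a packet;
it only shows the standard-library pieces you could swap in.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for a rand()-based loss test.
struct Packet_Dropper {
    std::mt19937 generator;             // Mersenne Twister from <random>
    std::bernoulli_distribution drop;   // true with probability p

    Packet_Dropper(double loss_probability, unsigned int seed)
        : generator(seed), drop(checked(loss_probability)) {}

    // Call once per outgoing packet.
    bool should_drop() { return drop(generator); }

    // Validates the probability before the distribution is built.
    static double checked(double p) {
        assert(p >= 0.0 && p <= 1.0);
        return p;
    }
};
```

With a fixed seed the sequence of drops is repeatable, which makes
it easier to compare compression results between runs.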

Right now, when we lose a packet, the accumulator for that
block's message data hangs around forever.  This is not a good
thing, since in a long enough session, you'd run out of memory.
You can add a simple routine that reaps accumulators when they
become too old.
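
A reaping routine along those lines might look like the sketch
below.  The Accumulator struct, the tick counter, and the map
keyed by block number are all assumptions made for the example;
the real accumulator in this code is laid out differently.  The
sketch also does not address whether the server and client must
agree on when a block is reaped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the per-block accumulator.
struct Accumulator {
    uint32_t last_touched_tick = 0;
    // The block's message data would live here.
};

// Erases every accumulator not touched within max_age ticks.
// Returns how many were reaped.
int reap_old_accumulators(std::map<uint32_t, Accumulator> &by_block,
                          uint32_t now, uint32_t max_age) {
    int num_reaped = 0;
    for (auto it = by_block.begin(); it != by_block.end(); ) {
        assert(it->second.last_touched_tick <= now);
        if (now - it->second.last_touched_tick > max_age) {
            it = by_block.erase(it);   // erase returns the next iterator
            num_reaped++;
        } else {
            ++it;
        }
    }
    return num_reaped;
}
```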


Future Work
-----------
After a new data model is synthesized, this code uses only that
model; the server never goes back and uses previous models (though
the client still wants to keep some around, in case older packets
arrive out of order).  To improve compression at a significant
CPU cost, the server could try compressing a packet with a number 
of recent models.  Since each packet is already tagged with the
index of its data model, this would just work.  However, take care
that the server does not reuse an old model that is about to be
recomputed.  Otherwise the client could update the model at a given
index and then receive a packet encoded with the old model that
used that same index.  This would be a mess.
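
A minimal sketch of that selection step follows.  It assumes the
models live in a fixed number of indexed slots and that the server
knows which slot the next update will overwrite; choose_model_slot
and the compressed_size_with callback are hypothetical names, not
functions in this code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the slot whose model gives the smallest packet, skipping
// the slot that the next model update will overwrite.
// compressed_size_with(slot) is assumed to trial-compress the
// packet with that slot's model and report the size.
int choose_model_slot(int num_slots, int slot_about_to_be_recomputed,
                      const std::function<size_t(int)> &compressed_size_with) {
    assert(num_slots > 1);
    int best_slot = -1;
    size_t best_size = 0;
    for (int slot = 0; slot < num_slots; slot++) {
        if (slot == slot_about_to_be_recomputed) continue;
        size_t size = compressed_size_with(slot);
        if (best_slot < 0 || size < best_size) {
            best_slot = slot;
            best_size = size;
        }
    }
    return best_slot;
}
```

Each candidate costs one full trial compression of the packet,
which is where the significant CPU cost comes from.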


   -Jonathan Blow (jon@number-none.com)
    Monday, July 21, 2003
    Flight Path Coffee House, Austin, Texas
